Explainable Rule Application via Structured Prompting: A Neural-Symbolic Approach
Sadowski, Albert, Chudziak, Jarosław A.
Large Language Models (LLMs) excel in complex reasoning tasks but struggle with consistent rule application, exception handling, and explainability, particularly in domains like legal analysis that require both natural language understanding and precise logical inference. This paper introduces a structured prompting framework that decomposes reasoning into three verifiable steps: entity identification, property extraction, and symbolic rule application. By integrating neural and symbolic approaches, our method leverages LLMs' interpretive flexibility while ensuring logical consistency through formal verification. The framework externalizes task definitions, enabling domain experts to refine logical structures without altering the architecture. Evaluated on the LegalBench hearsay determination task, our approach significantly outperformed baselines, with OpenAI o-family models showing substantial improvements: o1 achieved an F1 score of 0.929 and o3-mini reached 0.867 using structured decomposition with complementary predicates, compared to their few-shot baselines of 0.714 and 0.74, respectively. This hybrid neural-symbolic system offers a promising pathway for transparent and consistent rule-based reasoning, suggesting potential for explainable AI applications in structured legal reasoning tasks.
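The three-step decomposition above lends itself to a small sketch: once an LLM has identified the relevant statement and extracted its properties (steps 1 and 2), the verdict in step 3 is computed symbolically, so identical extracted properties always yield identical outcomes. The predicate names below are illustrative assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    # Steps 1-2: in the framework, an LLM identifies the entity and
    # extracts these boolean properties from the case description;
    # here they are supplied directly for illustration.
    made_out_of_court: bool
    offered_for_truth: bool


def is_hearsay(s: Statement) -> bool:
    # Step 3: symbolic rule application -- a deterministic rule over
    # the extracted properties, independent of the LLM.
    return s.made_out_of_court and s.offered_for_truth


assert is_hearsay(Statement(made_out_of_court=True, offered_for_truth=True))
assert not is_hearsay(Statement(made_out_of_court=True, offered_for_truth=False))
```

Because only the extraction steps involve the LLM, a domain expert could refine the rule itself without touching the prompting architecture, which is the externalization the abstract describes.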
Metadata-Guided Adaptable Frequency Scaling across Heterogeneous Applications and Devices
Yan, Jinqi, He, Fang, Sang, Qianlong, Tong, Bifeng, Sun, Peng, Gong, Yili, Hu, Chuang, Cheng, Dazhao
Abstract--Dynamic Voltage and Frequency Scaling (DVFS) is essential for enhancing energy efficiency in mobile platforms. However, traditional heuristic-based governors are increasingly inadequate for managing the complexity of heterogeneous System-on-Chip designs and diverse application workloads. Although reinforcement learning approaches offer improved performance, their poor generalization capability and reliance on extensive retraining for each hardware and application combination lead to significant deployment costs. In this work, we observe that device and application metadata inherently encapsulate valuable knowledge for DVFS, presenting an opportunity to overcome these limitations. We formulate DVFS for heterogeneous devices and applications as a multi-task reinforcement learning problem and introduce MetaDVFS, a metadata-guided framework that systematically leverages metadata to discover and transfer shared knowledge across DVFS tasks. Evaluations on five Google Pixel devices running six applications show that MetaDVFS achieves up to 17% improvement in Performance-Power Ratio and up to 26% improvement in Quality of Experience. Compared to state-of-the-art methods, MetaDVFS delivers 70.8% faster adaptation (3.5 ± 1.1 vs. 11.8 ± 5.2 minutes) and gains of 5.8-27.6%. These results establish MetaDVFS as an effective and scalable solution for DVFS deployment in heterogeneous mobile environments.

Dynamic Voltage and Frequency Scaling (DVFS) is an essential technique for effectively improving energy efficiency in battery-powered mobile platforms. DVFS adjusts the operating voltage and frequency of a device in response to current workload demands [1]. Experimental evaluations report energy savings exceeding 26% on mobile MPSoCs with DVFS enabled compared to statically managed systems [2].
Traditional DVFS policies typically rely on heuristic-based governors, such as ondemand and schedutil, which make frequency decisions based primarily on simple utilization metrics. Jinqi Yan, Qianlong Sang, Yili Gong, Chuang Hu, and Dazhao Cheng are with the School of Computer Science, Wuhan University.
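The simplicity that the paper argues against can be made concrete: an ondemand-style governor maps a single utilization number to a frequency step, with no awareness of device or application characteristics. The operating points and thresholds below are invented for illustration.

```python
# Hypothetical operating-point table (MHz); real values come from the SoC.
FREQS_MHZ = [600, 1200, 1800, 2400]


def ondemand_step(utilization: float, cur_idx: int) -> int:
    """Return the index of the next frequency, ondemand-style:
    jump to max when busy, step down when idle, otherwise hold.
    Thresholds are illustrative assumptions."""
    if utilization > 0.8:
        return len(FREQS_MHZ) - 1          # busy: go straight to max
    if utilization < 0.3 and cur_idx > 0:
        return cur_idx - 1                 # mostly idle: step down
    return cur_idx                         # otherwise keep frequency


assert FREQS_MHZ[ondemand_step(0.95, 0)] == 2400
assert ondemand_step(0.10, 2) == 1
```

A learned policy like MetaDVFS would replace this fixed rule with one conditioned on device and application metadata, which is what lets it transfer across the heterogeneous combinations the abstract discusses.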
Prosocial Behavior Detection in Player Game Chat: From Aligning Human-AI Definitions to Efficient Annotation at Scale
Kocielnik, Rafal, Kim, Min, Boonyarungsrit, Penphob, Soltani, Fereshteh, Sambrano, Deshawn, Anandkumar, Animashree, Alvarez, R. Michael
Detecting prosociality in text--communication intended to affirm, support, or improve others' behavior--is a novel and increasingly important challenge for trust and safety systems. Unlike toxic content detection, prosociality lacks well-established definitions and labeled data, requiring new approaches to both annotation and deployment. We present a practical, three-stage pipeline that enables scalable, high-precision prosocial content classification while minimizing human labeling effort and inference costs. First, we identify the best LLM-based labeling strategy using a small seed set of human-labeled examples. We then introduce a human-AI refinement loop, where annotators review high-disagreement cases between GPT-4 and humans to iteratively clarify and expand the task definition--a critical step for emerging annotation tasks like prosociality. This process results in improved label quality and definition alignment. Finally, we synthesize 10k high-quality labels using GPT-4 and train a two-stage inference system: a lightweight classifier handles high-confidence predictions, while only $\sim$35\% of ambiguous instances are escalated to GPT-4o. This architecture reduces inference costs by $\sim$70% while achieving high precision ($\sim$0.90). Our pipeline demonstrates how targeted human-AI interaction, careful task formulation, and deployment-aware architecture design can unlock scalable solutions for novel responsible AI tasks.
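The two-stage inference design reduces to a confidence-threshold router: a cheap classifier answers cases it is sure about, and only the rest are escalated to the large model. The stub models and the 0.9 threshold below are assumptions for illustration, not the paper's trained components.

```python
def classify(text, small_model, large_model, threshold=0.9):
    """Route a text through a cheap classifier first; escalate to the
    expensive model only when the cheap one is not confident enough."""
    label, confidence = small_model(text)
    if confidence >= threshold:
        return label, "small"              # high-confidence: stop here
    return large_model(text), "large"      # ambiguous: escalate (~35% in the paper)


# Toy stand-ins for the trained classifier and the large model.
small = lambda t: ("prosocial", 0.97) if "thank" in t else ("unclear", 0.50)
large = lambda t: "prosocial"

assert classify("thank you for the help!", small, large) == ("prosocial", "small")
assert classify("ok", small, large)[1] == "large"
```

The cost saving comes entirely from how often the first branch fires; the reported ~70% reduction implies the lightweight classifier is confident on roughly two-thirds of traffic.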
EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos
Yang, Ruihan, Yu, Qinxi, Wu, Yecheng, Yan, Rui, Li, Borui, Cheng, An-Chieh, Zou, Xueyan, Fang, Yunhao, Cheng, Xuxin, Qiu, Ri-Zhao, Yin, Hongxu, Liu, Sifei, Han, Song, Lu, Yao, Wang, Xiaolong
Real robot data collection for imitation learning has led to significant advancements in robotic manipulation. However, the requirement for robot hardware in the process fundamentally constrains the scale of the data. In this paper, we explore training Vision-Language-Action (VLA) models using egocentric human videos. The benefit of using human videos is not only their scale but, more importantly, the richness of scenes and tasks. With a VLA trained on human video that predicts human wrist and hand actions, we can perform Inverse Kinematics and retargeting to convert the human actions to robot actions. We fine-tune the model using a few robot manipulation demonstrations to obtain the robot policy, namely EgoVLA. We propose a simulation benchmark called Ego Humanoid Manipulation Benchmark, where we design diverse bimanual manipulation tasks with demonstrations. We fine-tune and evaluate EgoVLA on the Ego Humanoid Manipulation Benchmark, showing significant improvements over baselines, and run ablations that demonstrate the importance of human data. Videos can be found on our website: https://rchalyang.github.io/EgoVLA
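The human-to-robot conversion can be caricatured in a few lines: a predicted human wrist position is mapped into the robot's workspace before IK solves for joint angles. The fixed similarity transform below is invented for illustration; the paper's actual retargeting is considerably richer and also handles hand articulation.

```python
def retarget_wrist(p_human, scale=0.8, offset=(0.1, 0.0, 0.2)):
    """Map a predicted human wrist position (x, y, z in meters) into the
    robot workspace with a fixed scale and offset; a real pipeline would
    then run Inverse Kinematics on the result to get joint targets.
    Transform values are illustrative assumptions."""
    return tuple(scale * x + o for x, o in zip(p_human, offset))


target = retarget_wrist((0.5, 0.0, 0.5))
assert all(abs(a - b) < 1e-9 for a, b in zip(target, (0.5, 0.0, 0.6)))
```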
V-SYNTHESIS: Task-Agnostic Synthesis of Consistent and Diverse In-Context Demonstrations from Scratch via V-Entropy
Wang, Dingzirui, Zhang, Xuanliang, Xu, Keyan, Zhu, Qingfu, Che, Wanxiang, Deng, Yang
High labeling cost for in-context learning (ICL) demonstrations motivates using large language models (LLMs) for synthesis to reduce overhead. However, existing synthesis methods are mainly task-specific or rely on pre-existing demonstrations, so this paper focuses on synthesizing demonstrations from scratch for arbitrary tasks. A major challenge in synthesizing from scratch is ensuring consistency with the target task, as the lack of labeling guidance can lead to synthesis bias. We first propose a consistency metric called V-Score, which achieves higher performance and lower computation cost than metrics based on n-grams or embedding vectors. Furthermore, we introduce V-Synthesis, which leverages V-Score for proportional sampling to ensure both high consistency and diversity of synthesized demonstrations. Experimental results demonstrate that V-Synthesis yields an average performance improvement of 2.0% over existing synthesis methods, confirming its effectiveness.
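Proportional sampling by a consistency score can be sketched generically: high-scoring candidates are favored, but low-scoring ones keep a nonzero probability, which is what preserves diversity. This mirrors the V-Score sampling idea only at a high level; the scoring function and with-replacement sampling here are simplifications.

```python
import random


def proportional_sample(candidates, scores, k, seed=0):
    """Draw k candidates with probability proportional to their scores
    (with replacement, for simplicity). A high-consistency candidate is
    likely but not guaranteed to be picked, so diversity survives."""
    rng = random.Random(seed)
    total = sum(scores)
    weights = [s / total for s in scores]
    return rng.choices(candidates, weights=weights, k=k)


demos = proportional_sample(["d1", "d2", "d3"], [0.7, 0.2, 0.1], k=2)
assert len(demos) == 2 and all(d in {"d1", "d2", "d3"} for d in demos)
```

Greedy top-k selection by score would maximize consistency but collapse diversity; sampling in proportion to the score is the simplest middle ground.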
In-Context Transfer Learning: Demonstration Synthesis by Transferring Similar Tasks
Wang, Dingzirui, Zhang, Xuanliang, Chen, Qiguang, Dou, Longxu, Xu, Xiao, Cao, Rongyu, Ma, Yingwei, Zhu, Qingfu, Che, Wanxiang, Li, Binhua, Huang, Fei, Li, Yongbin
In-context learning (ICL) is an effective approach to help large language models (LLMs) adapt to various tasks by providing demonstrations of the target task. Considering the high cost of labeling demonstrations, many methods propose synthesizing demonstrations from scratch using LLMs. However, the quality of demonstrations synthesized from scratch is limited by the capabilities and knowledge of LLMs. To address this, inspired by transfer learning, we propose In-Context Transfer Learning (ICTL), which synthesizes target task demonstrations by transferring labeled demonstrations from similar source tasks. ICTL consists of two steps: source sampling and target transfer. First, we define an optimization objective, which minimizes transfer error to sample source demonstrations similar to the target task. Then, we employ LLMs to transfer the sampled source demonstrations to the target task, matching the definition and format of the target task. Experiments on Super-NI show that ICTL outperforms synthesis from scratch by 2.0% on average, demonstrating the effectiveness of our method.

In-context learning (ICL) is an effective approach for large language models (LLMs) to adapt to various tasks, building on the strong generalization ability of LLMs (Xun et al., 2017; Song et al., 2023b; Luo et al., 2024a). During inference with ICL, the input includes not only the user question but also several demonstrations to guide LLMs in generating answers correctly. Considering the high cost of demonstration labeling, many methods utilize LLMs to synthesize demonstrations from scratch without human involvement (Kim et al., 2022; Jin & Lu, 2024). For instance, Self-ICL (Chen et al., 2023b) employs LLMs to synthesize demonstrations based on the task definition, while Su et al. (2024) improve the synthesis through iterations, where each iteration uses the previous results.
However, the synthesis using LLMs from scratch is constrained by the capabilities and knowledge of LLMs, limiting the quality of the synthesized demonstrations (Yu et al., 2023).
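The two ICTL steps can be sketched with a toy similarity measure in place of the paper's transfer-error objective (Jaccard overlap of task-definition words is an assumption made purely for illustration), and with the LLM transfer step stubbed out.

```python
def jaccard(a: str, b: str) -> float:
    """Toy task similarity: word overlap between two task definitions."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)


def sample_sources(target_def, source_tasks, k=1):
    # Step 1: source sampling -- keep the k source tasks most similar
    # to the target task definition.
    ranked = sorted(source_tasks,
                    key=lambda t: jaccard(target_def, t["definition"]),
                    reverse=True)
    return ranked[:k]


def transfer(demo, target_def, llm):
    # Step 2: target transfer -- an LLM rewrites the sampled demonstration
    # to match the target task's definition and format (stubbed here).
    return llm(f"Rewrite this demonstration to fit: {target_def}\n{demo}")


tasks = [
    {"definition": "classify the sentiment of a movie review"},
    {"definition": "translate english to french"},
]
top = sample_sources("classify the sentiment of a product review", tasks, k=1)
assert "sentiment" in top[0]["definition"]
```

The key contrast with from-scratch synthesis is that step 2 starts from a labeled demonstration, so the LLM only has to adapt format and definition rather than invent content.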
GRS: Generating Robotic Simulation Tasks from Real-World Images
Zook, Alex, Sun, Fan-Yun, Spjut, Josef, Blukis, Valts, Birchfield, Stan, Tremblay, Jonathan
We introduce GRS (Generating Robotic Simulation tasks), a novel system to address the challenge of real-to-sim in robotics, computer vision, and AR/VR. GRS enables the creation of digital twin simulations from single real-world RGB-D observations, complete with diverse, solvable tasks for virtual agent training. We use state-of-the-art vision-language models (VLMs) to achieve a comprehensive real-to-sim pipeline. GRS operates in three stages: 1) scene comprehension using SAM2 for object segmentation and VLMs for object description, 2) matching identified objects with simulation-ready assets, and 3) generating contextually appropriate robotic tasks. Our approach keeps simulations aligned with task specifications by generating test suites that verify adherence to them. We introduce a router that iteratively refines the simulation and test code to ensure the simulation is solvable by a robot policy while remaining aligned to the task specification. Our experiments demonstrate the system's efficacy in accurately identifying object correspondence, which allows us to generate task environments that closely match input environments, and enhance automated simulation task generation through our novel router mechanism.
PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling
Xie, Xudong, Yin, Liang, Yan, Hao, Liu, Yang, Ding, Jing, Liao, Minghui, Liu, Yuliang, Chen, Wei, Bai, Xiang
Document understanding is a challenging task that requires processing and comprehending large amounts of textual and visual information. Recent advances in Large Language Models (LLMs) have significantly improved the performance of this task. However, existing methods typically focus on either plain text or a limited number of document images, struggling to handle long PDF documents with interleaved text and images, especially in academic papers. In this paper, we introduce PDF-WuKong, a multimodal large language model (MLLM) designed to enhance multimodal question-answering (QA) for long PDF documents. PDF-WuKong incorporates a sparse sampler that operates on both text and image representations, significantly improving the efficiency and capability of the MLLM. The sparse sampler is integrated with the MLLM's image encoder and selects the paragraphs or diagrams most pertinent to user queries for processing by the language model. To effectively train and evaluate our model, we construct PaperPDF, a dataset consisting of a broad collection of academic papers sourced from arXiv; multiple strategies are proposed to automatically generate 1M QA pairs along with their corresponding evidence sources. Experimental results demonstrate the superiority and high efficiency of our approach over other models on the task of long multimodal PDF understanding, surpassing proprietary products by an average of 8.6% on F1. Our code and dataset will be released at https://github.com/yh-hust/PDF-Wukong.
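Query-conditioned sparse sampling can be illustrated in miniature: score every paragraph against the query and keep only the top-k for the language model, shrinking the context a long PDF would otherwise require. The bag-of-words scorer here is a stand-in assumption for the paper's learned sampler, which also covers diagrams via the image encoder.

```python
def sparse_sample(query, paragraphs, k=2):
    """Keep the k paragraphs most relevant to the query, scored by
    word overlap (a toy stand-in for a learned relevance model)."""
    q = set(query.lower().split())
    scored = sorted(paragraphs,
                    key=lambda p: len(q & set(p.lower().split())),
                    reverse=True)
    return scored[:k]


paras = [
    "the model uses sparse sampling",
    "unrelated acknowledgements",
    "sampling selects relevant diagrams",
]
kept = sparse_sample("sparse sampling of diagrams", paras, k=2)
assert "unrelated acknowledgements" not in kept
```

Whatever the scorer, the efficiency gain is the same: the language model only ever sees k passages instead of the whole document.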
Hindi-BEIR : A Large Scale Retrieval Benchmark in Hindi
Acharya, Arkadeep, Murthy, Rudra, Kumar, Vishwajeet, Sen, Jaydeep
Given the large number of Hindi speakers worldwide, there is a pressing need for robust and efficient information retrieval systems for Hindi. Despite ongoing research, there is a lack of a comprehensive benchmark for evaluating retrieval models in Hindi. To address this gap, we introduce the Hindi version of the BEIR benchmark, which includes a subset of English BEIR datasets translated to Hindi, existing Hindi retrieval datasets, and synthetically created datasets for retrieval. The benchmark comprises 15 datasets spanning 8 distinct tasks. We evaluate state-of-the-art multilingual retrieval models on this benchmark to identify task and domain-specific challenges and their impact on retrieval performance. By releasing this benchmark and a set of relevant baselines, we enable researchers to understand the limitations and capabilities of current Hindi retrieval models, promoting advancements in this critical area. The datasets from Hindi-BEIR are publicly available.